Acronym Disambiguation Using Word Embedding

نویسندگان

  • Chao Li
  • Lei Ji
  • Jun Yan
چکیده

According to the website AcronymFinder.com which is one of the world's largest and most comprehensive dictionaries of acronyms, an average of 37 new human-edited acronym definitions are added every day. There are 379,918 acronyms with 4,766,899 definitions on that site up to now, and each acronym has 12.5 definitions on average. It is a very important research topic to identify what exactly an acronym means in a given context for document comprehension as well as for document retrieval. In this paper, we propose two word embedding based models for acronym disambiguation. Word embedding is to represent words in a continuous and multidimensional vector space, so that it is easy to calculate the semantic similarity between words by calculating the vector distance. We evaluate the models on MSH Dataset and ScienceWISE Dataset, and both models outperform the state-of-art methods on accuracy. The experimental results show that word embedding helps to improve acronym disambiguation.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A (acronyms)

Acronyms are a significant and the most dynamic area of the lexicon of many languages. Building automated acronym systems poses two problems: acquisition and disambiguation. Acronym acquisition is based on the identification of anaphoric or cataphoric expressions which introduce the meaning of an acronym in text; acronym disambiguation is a word sense disambiguation task, with expansions of an ...

متن کامل

Acronym-Expansion Disambiguation for Intelligent Processing of Enterprise Information

An acronym is an abbreviation of several words in such a way that the abbreviation itself forms a pronounceable word. Acronyms occur frequently throughout various documents, especially those of a technical nature, for example, research papers and patents. While these acronyms can enhance document readability, in a variety of fields, they have a negative effect on business intelligence. To resol...

متن کامل

Managing the Acronym/Expansion Identification Process for Text-Mining Applications

This paper deals with an acronym/definition extraction approach from textual data (corpora) and the disambiguation of these definitions (or expansions). Both steps of our global process of acquisition and management of acronyms are precisely described. The first step consists in using markers such as brackets to identify expansion candidates. The alignment of the letters allows to select the ac...

متن کامل

Using the Web to Disambiguate Acronyms

This paper proposes an automatic method for disambiguating an acronym with multiple definitions, considering the context surrounding the acronym. First, the method obtains the Web pages that include both the acronym and its definitions. Second, the method feeds them to the machine learner. Cross-validation tests results indicate that the current accuracy of obtaining the appropriate definition ...

متن کامل

Clinical Abbreviation Disambiguation Using Neural Word Embeddings

This study examined the use of neural word embeddings for clinical abbreviation disambiguation, a special case of word sense disambiguation (WSD). We investigated three different methods for deriving word embeddings from a large unlabeled clinical corpus: one existing method called Surrounding based embedding feature (SBE), and two newly developed methods: Left-Right surrounding based embedding...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015